SALSA: improved protein database searching by a new algorithm for assembly of sequence fragments into gapped alignments

نویسندگان

  • Torbjørn Rognes
  • Erling Seeberg
چکیده

MOTIVATION Optimal sequence alignment based on the Smith-Waterman algorithm is usually too computationally demanding to be practical for searching large sequence databases. Heuristic programs like FASTA and BLAST have been developed which run much faster, but at the expense of sensitivity. RESULTS In an effort to approximate the sensitivity of an optimal alignment algorithm, a new algorithm has been devised for the computation of a gapped alignment of two sequences. After scanning for high-scoring words and extensions of these to form fragments of similarity, the algorithm uses dynamic programming to build an accurate alignment based on the fragments initially identified. The algorithm has been implemented in a program called SALSA and the performance has been evaluated on a set of test sequences. The sensitivity was found to be close to the Smith-Waterman algorithm, while the speed was similar to FASTA (ktup = 2). AVAILABILITY Searches can be performed from the SALSA homepage at http://dna.uio.no/salsa/ using a wide range of databases. Source code and precompiled executables are also available. CONTACT [email protected]

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Gapped BLAST and PSLBLAST: a new generation of protein database search programs

The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension ...

متن کامل

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of...

متن کامل

Comparative accuracy of methods for protein sequence similarity search

MOTIVATION Searching a protein sequence database for homologs is a powerful tool for discovering the structure and function of a sequence. Two new methods for searching sequence databases have recently been described: Probabilistic Smith-Waterman (PSW), which is based on Hidden Markov models for a single sequence using a standard scoring matrix, and a new version of BLAST (WU-BLAST2), which use...

متن کامل

PowerBLAST: a new network BLAST application for interactive or automated sequence analysis and annotation.

As the rate of DNA sequencing increases, analysis by sequence similarity search will need to become much more efficient in terms of sensitivity, specificity, automation potential, and consistency in annotation. PowerBLAST was developed, in part, to address these problems. PowerBLAST includes a number of options for masking repetitive elements and low complexity subsequences. It also has the cap...

متن کامل

A DNA based Approach to find Closed Repetitive Gapped Subsequences from a Sequence Database

In bioinformatics, the discovery of transcription factor binding affinities is important. This is done by sequence analysis of micro array data. The determination of continuous and gapped motifs accurately from the given long sequence of data, say genetic data is challenging and requires a detailed study. In this paper, we propose an algorithm that can be used for finding short continuous, shor...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Bioinformatics

دوره 14 10  شماره 

صفحات  -

تاریخ انتشار 1998